2) Were there specific days of the week that saw higher intake volumes?

When looking at all species together, Tuesday and Wednesday have slightly higher volumes than other days.

# add the variable
foo$weekday <- weekdays(foo$intake_date)
foo$weekday_ord <- factor(foo$weekday,                                    # Change ordering manually
                          levels = c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"))

# function for plotting
day_volume <- function(foo){
  
  plot_data <- foo %>% count(weekday_ord) 
  
  ggplot(plot_data) + geom_col(aes(x = weekday_ord, y=n)) +
    labs(x = 'Day', y = 'Count')+
    geom_text(aes(x = weekday_ord, label = sprintf("%.f", n), y= n), vjust=2, colour="white", size=4)
}

day_volume(foo)

When looking at dogs only, they are still higher, but the trend is not apparent. Sunday is relatively lower.

day_volume(foo %>% filter(species == 'Dog'))

Which suggest that for cats we will see it more distinctly, as indeed this figure shows. More specifically, Wednesday is cat day (Thursday following), as Tuesday is calmer than all other days.

day_volume(foo %>% filter(species == 'Cat'))

3) Where are there holes in the data?

hint: think about providing an analysis that a shelter operations director might be able to use to try and tell how staff are doing with proper data input.

The missing values, in decreasing order:

  1. Intake reason has 3511 missing values. 2370 of those are for strays (and the reason listed for 2276 of the 2335 strays with a reason listed is ‘STRAY’), but 1141 non-stray animals, whose reason entries are usually informative, are still missing.
  2. Intake subtype follows with 698 NAs. 398 of those are strays, but other strays do have an informative value here (mostly Field/OTC). Also, it seems like there are many common values between reason and intake_subtype (for example, eviction, owner died, Covid) which suggest some standardization could be useful there (decide what counts as a subtype and what counts as a reason, and make these categories mutually exclusive). Separating the categories and improving their data input is the main finding here.
  3. Found ZIP code has 280 NAs, whereas Finder has 160 values of ‘0’, which is better changed to a blank or ‘Unknown’ value for consistency.
  4. Date of birth has 34 NAs.

4) Surprise us! Using the data, please provide a visualization that gives a unique insight into the data.

Here’s a heat-map showing the number of Found animals per ZIP code! A few ZIP codes in the center of the city stand out and numbers relatively decrease in the outskirts. 85705 and 85706 also stand out as fairly smaller ZIP codes with high intakes (although presumably with a denser population). 114 animals also came from way outside town (85321 - a different shelter?).

# load geometry
geometry <- readRDS('zips.rds')

# just counts
finder_count <- foo %>% filter(src_finders_zip_code!=0) %>% group_by(zip=src_finders_zip_code) %>% 
  summarise(count = length(src_finders_zip_code), .groups='keep')

found_count <- foo %>% group_by(zip=src_found_zip_code) %>% 
  summarise(count = length(src_found_zip_code), .groups='keep')

# Merge the counts and name properly
countDF <- inner_join(finder_count,found_count, by='zip')
colnames(countDF) = c('zip', 'countFinder', 'countFound')
countDF$zip = as.character(countDF$zip)

count_sf <- geometry %>% inner_join(countDF, by = "zip")


fix_sf <- function(old_sf){
  new_sf <- old_sf %>% st_transform(4326)
  names(st_geometry(new_sf)) = NULL
  
  #return (old_sf) # raises warnings that suggests to do the below lines, but still works
   return(st_transform(old_sf, '+proj=longlat +datum=WGS84')) # this works well locally but fails there
  #return (new_sf) # this works without warning locally, but not on shinyapps
}  


pal <- colorBin(palette='Purples', domain = count_sf$countFound, bins = c(0, 50, 100, 200, 500, 1000))
label <- sprintf("<strong>%s</strong><br/>%g %s", count_sf$zip, count_sf$countFound, 'Found Animals') %>% 
  lapply(htmltools::HTML)




leaflet() %>%
  addTiles() %>%
  setView(lat = 32.2239217, lng = -110.917225, zoom=8) %>% 
  addPolygons(data=fix_sf(count_sf), group='Found', fillColor=~pal(countFound),
              fillOpacity = 0.7, color='grey', weight = 1, opacity = 0.4, label = label,
              highlightOptions = highlightOptions(color = "black",weight = 2, bringToFront = TRUE)) %>%
  addLegend(pal = pal, values = count_sf$countFound, opacity = 0.7, title = 'Found Animals',
                            position = "bottomright", group='Found')